Visual Inertial Odometry (VIO) is one of the most established state estimation methods for mobile platforms. However, when visual tracking fails, VIO algorithms quickly diverge due to rapid error accumulation during inertial data integration. This error is typically modeled as a combination of additive Gaussian noise and a slowly changing bias which evolves as a random walk. In this work, we propose to train a neural network to learn the true bias evolution. We implement and compare two common sequential deep learning architectures: LSTMs and Transformers. Our approach follows from recent learning-based inertial estimators, but, instead of learning a motion model, we target IMU bias explicitly, which allows us to generalize to locomotion patterns unseen in training. We show that our proposed method improves state estimation in visually challenging situations across a wide range of motions by quadrupedal robots, walking humans, and drones. Our experiments show an average 15% reduction in drift rate, with much larger reductions when there is total vision failure. Importantly, we also demonstrate that models trained with one locomotion pattern (human walking) can be applied to another (quadruped robot trotting) without retraining.
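The error model mentioned above (additive Gaussian noise plus a bias that evolves as a random walk) can be illustrated with a minimal simulation of a single gyroscope axis. This is only a sketch under stated assumptions: the function names, the 200 Hz sample rate, and the noise magnitudes are hypothetical and are not taken from the paper, and `noise_std` is treated as a per-sample standard deviation.

```python
import math
import random


def simulate_gyro_axis(true_rate, dt, noise_std, bias_walk_std, seed=0):
    """Simulate one gyro axis: measurement = truth + bias + white noise.

    All parameter values are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    bias = 0.0
    measurements, biases = [], []
    for omega in true_rate:
        # Bias random walk: each increment is Gaussian and scales with sqrt(dt).
        bias += rng.gauss(0.0, bias_walk_std * math.sqrt(dt))
        # Additive white Gaussian noise on this sample.
        noise = rng.gauss(0.0, noise_std)
        measurements.append(omega + bias + noise)
        biases.append(bias)
    return measurements, biases


def integrate(rates, dt):
    """Euler-integrate angular rate into an angle."""
    angle = 0.0
    for r in rates:
        angle += r * dt
    return angle


dt = 0.005                 # assumed 200 Hz IMU
true_rate = [0.0] * 2000   # stationary sensor for 10 s
meas, biases = simulate_gyro_axis(true_rate, dt, noise_std=0.01, bias_walk_std=0.002)

raw_angle = integrate(meas, dt)  # drifts because the bias is integrated
corrected_angle = integrate([m - b for m, b in zip(meas, biases)], dt)  # bias removed
```

Subtracting a known bias before integration leaves only the white-noise contribution, which is why an accurate bias estimate matters when no visual correction is available.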
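This paper presents the Cerberus robotic system-of-systems, which won the DARPA Subterranean Challenge Final Event. Because of their geometric complexity, degraded perceptual conditions, lack of GPS support, austere navigation conditions, and denied communications, subterranean settings make autonomous operation particularly demanding. In response to this challenge, we developed the Cerberus system, which exploits the synergy of legged and flying robots, coupled with robust control, especially for overcoming perilous terrain; multi-modal and multi-robot perception for mapping under sensor degradation; and resilient autonomy through unified exploration path planning and local motion planning that reflects robot-specific limitations. Based on its ability to explore diverse underground environments and its high-level command and control, Cerberus demonstrated efficient exploration, reliable detection of objects of interest, and accurate mapping. In this paper, we report results from both the preliminary runs and the final Prize Round of the DARPA Subterranean Challenge, and discuss the highlights and challenges faced, along with lessons learned for the benefit of the community.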
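This paper introduces a novel proprioceptive state estimator for legged robots based on a learned displacement measurement from IMU data. Recent research in pedestrian tracking has shown that motion can be inferred from inertial data using convolutional neural networks. A learned inertial displacement measurement can improve state estimation in challenging scenarios where leg odometry is unreliable, such as slippery and compressible terrain. Our work learns to estimate a displacement measurement from IMU data, which is then fused with traditional leg odometry. Our approach greatly reduces state estimation drift, which is critical for legged robots deployed in environments where vision and lidar are denied, such as foggy sewers or dusty mines. We compare results from an EKF and an incremental fixed-lag factor graph estimator using data from several real robot experiments crossing challenging terrain. Our results show a 37% reduction in relative pose error in challenging scenarios compared to a traditional kinematic-inertial estimator without the learned measurement. We also demonstrate a 22% reduction in error when the method is used with vision systems in visually degraded environments such as an underground mine.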
Purpose: Tracking the 3D motion of the surgical tool and the patient anatomy is a fundamental requirement for computer-assisted skull-base surgery. The estimated motion can be used both for intra-operative guidance and for downstream skill analysis. Recovering such motion solely from surgical videos is desirable, as it is compliant with current clinical workflows and instrumentation. Methods: We present Tracker of Anatomy and Tool (TAToo). TAToo jointly tracks the rigid 3D motion of patient skull and surgical drill from stereo microscopic videos. TAToo estimates motion via an iterative optimization process in an end-to-end differentiable form. For robust tracking performance, TAToo adopts a probabilistic formulation and enforces geometric constraints on the object level. Results: We validate TAToo on both simulation data, where ground truth motion is available, and anthropomorphic phantom data, where optical tracking provides a strong baseline. We report sub-millimeter and millimeter inter-frame tracking accuracy for skull and drill, respectively, with rotation errors below 1°. We further illustrate how TAToo may be used in a surgical navigation setting. Conclusion: We present TAToo, which simultaneously tracks the surgical tool and the patient anatomy in skull-base surgery. TAToo directly predicts the motion from surgical videos, without the need for any markers. Our results show that the performance of TAToo compares favorably to competing approaches. Future work will include fine-tuning of our depth network to reach a 1 mm clinical accuracy goal desired for surgical applications in the skull base.
We present temporally layered architecture (TLA), a biologically inspired system for temporally adaptive distributed control. TLA layers a fast and a slow controller together to achieve temporal abstraction that allows each layer to focus on a different time-scale. Our design draws on the architecture of the human brain, which executes actions at different timescales depending on the environment's demands. Such distributed control design is widespread across biological systems because it increases survivability and accuracy in certain and uncertain environments. We demonstrate that TLA can provide many advantages over existing approaches, including persistent exploration, adaptive control, explainable temporal behavior, compute efficiency and distributed control. We present two different algorithms for training TLA: (a) Closed-loop control, where the fast controller is trained over a pre-trained slow controller, allowing better exploration for the fast controller and closed-loop control where the fast controller decides whether to "act-or-not" at each timestep; and (b) Partially open-loop control, where the slow controller is trained over a pre-trained fast controller, allowing for open-loop control where the slow controller picks a temporally extended action or defers the next n actions to the fast controller. We evaluate our method on a suite of continuous control tasks and demonstrate the advantages of TLA over several strong baselines.
The xView2 competition and xBD dataset spurred significant advancements in overhead building damage detection, but the competition's pixel-level scoring can lead to reduced solution performance in areas with tight clusters of buildings or uninformative context. We seek to advance automatic building damage assessment for disaster relief by proposing an auxiliary challenge to the original xView2 competition. This new challenge involves a new dataset and metrics indicating solution performance when damage is more local and limited than in xBD. Our challenge measures a network's ability to identify individual buildings and their damage level without excessive reliance on the buildings' surroundings. Methods that succeed on this challenge will provide more fine-grained, precise damage information than original xView2 solutions. The performance of the best-performing xView2 networks dropped noticeably in our new limited/local damage detection task. The common causes of failure observed are that (1) building objects and their classifications are not separated well, and (2) when they are, the classification is strongly biased by surrounding buildings and other damage context. Thus, we release our augmented version of the dataset with additional object-level scoring metrics (https://gitlab.kitware.com/dennis.melamed/xfbd) to test independence and separability of building objects, alongside the pixel-level performance metrics of the original competition. We also experiment with new baseline models which improve independence and separability of building damage predictions. Our results indicate that building damage detection is not a fully solved problem, and we invite others to use and build on our dataset augmentations and metrics.
Knowledge of the symmetries of reinforcement learning (RL) systems can be used to create compressed and semantically meaningful representations of a low-level state space. We present a method of automatically detecting RL symmetries directly from raw trajectory data without requiring active control of the system. Our method generates candidate symmetries and trains a recurrent neural network (RNN) to discriminate between the original trajectories and the transformed trajectories for each candidate symmetry. The RNN discriminator's accuracy for each candidate reveals how symmetric the system is under that transformation. This information can be used to create high-level representations that are invariant to all symmetries on a dataset level and to communicate properties of the RL behavior to users. We show in experiments on two simulated RL use cases (a pusher robot and a UAV flying in wind) that our method can determine the symmetries underlying both the environment physics and the trained RL policy.
Hidden parameters are latent variables in reinforcement learning (RL) environments that are constant over the course of a trajectory. Understanding what, if any, hidden parameters affect a particular environment can aid both the development and appropriate usage of RL systems. We present an unsupervised method to map RL trajectories into a feature space where distance represents the relative difference in system behavior due to hidden parameters. Our approach disentangles the effects of hidden parameters by leveraging a recurrent neural network (RNN) world model as used in model-based RL. First, we alter the standard world model training algorithm to isolate the hidden parameter information in the world model memory. Then, we use a metric learning approach to map the RNN memory into a space with a distance metric approximating a bisimulation metric with respect to the hidden parameters. The resulting disentangled feature space can be used to meaningfully relate trajectories to each other and analyze the hidden parameter. We demonstrate our approach on four hidden parameters across three RL environments. Finally, we present two methods to help identify and understand the effects of hidden parameters on systems.
Principal Component Analysis (PCA) and its exponential family extensions have three components: observations, latents and parameters of a linear transformation. We consider a generalised setting where the canonical parameters of the exponential family are a nonlinear transformation of the latents. We show explicit relationships between particular neural network architectures and the corresponding statistical models. We find that deep equilibrium models (DEQs), a recently introduced class of implicit neural networks, solve maximum a posteriori (MAP) estimates for the latents and parameters of the transformation. Our analysis provides a systematic way to relate activation functions, dropout, and layer structure to statistical assumptions about the observations, thus providing foundational principles for unsupervised DEQs. For hierarchical latents, individual neurons can be interpreted as nodes in a deep graphical model. Our DEQ feature maps are end-to-end differentiable, enabling fine-tuning for downstream tasks.
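For readers unfamiliar with deep equilibrium models, the sketch below shows the generic idea of an implicit layer whose output is a fixed point z* = tanh(W z* + U x), found here by plain fixed-point iteration. It is not the model from the abstract: the `tanh` activation, the matrices `W` and `U`, the tolerance, and the function names are all assumptions chosen so that the iteration is a contraction and converges.

```python
import math


def layer(W, U, x, z):
    """One application of the implicit layer: f(z, x) = tanh(W z + U x)."""
    n = len(W)
    return [
        math.tanh(
            sum(W[i][j] * z[j] for j in range(n))
            + sum(U[i][k] * x[k] for k in range(len(x)))
        )
        for i in range(n)
    ]


def deq_forward(W, U, x, tol=1e-10, max_iter=500):
    """Find z* with z* = f(z*, x) by naive fixed-point iteration."""
    z = [0.0] * len(W)
    for _ in range(max_iter):
        z_new = layer(W, U, x, z)
        # Stop once successive iterates agree to within the tolerance.
        if max(abs(a - b) for a, b in zip(z_new, z)) < tol:
            return z_new
        z = z_new
    return z


# Hypothetical small weights: row sums of |W| are below 1, so f is a contraction in z.
W = [[0.2, -0.1], [0.05, 0.3]]
U = [[1.0, 0.0], [0.5, -0.5]]
x = [0.3, -0.7]

z_star = deq_forward(W, U, x)
residual = max(abs(a - b) for a, b in zip(layer(W, U, x, z_star), z_star))
```

Practical DEQ implementations typically use faster root-finding solvers and implicit differentiation for the backward pass; the naive iteration here is only meant to make the fixed-point definition concrete.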
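Cross-view geolocalization localizes an agent within a search area by matching images taken from a ground-view camera to overhead images taken from satellites or aircraft. Although the viewpoint difference between ground and overhead images makes cross-view geolocalization challenging, significant progress has been made under the assumption that the ground agent has access to a panoramic camera. For example, our prior work (WAG) introduced changes in search area discretization, training loss, and particle filter weighting that enabled city-scale panoramic cross-view geolocalization. However, panoramic cameras are not widely used in existing robotic platforms because of their complexity and cost. Non-panoramic cross-view geolocalization is more applicable to robotics, but is also more challenging. This paper presents Restricted FOV Wide-Area Geolocalization (ReWAG), a cross-view geolocalization approach that generalizes WAG for use with standard, non-panoramic ground cameras by creating pose-aware embeddings and providing a strategy to incorporate particle pose into the Siamese network. ReWAG is a neural network and particle filter system that can globally localize a mobile agent in a GPS-denied environment using only odometry and a 90-degree FOV camera, achieving localization accuracy similar to that achieved with a panoramic camera and improving localization accuracy by a factor of 100 compared to a baseline vision transformer (ViT) approach. A video highlight demonstrating convergence on a test path of several dozen kilometers is available at https://youtu.be/u_obqrt8qce.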
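As a rough illustration of how a particle filter can use image embeddings for localization, the sketch below reweights particles by the distance between a ground-image embedding and the overhead embedding at each particle's location. It is a generic measurement update under stated assumptions, not the weighting scheme from the abstract: the Gaussian likelihood, the `sigma` value, and the function name `reweight` are hypothetical.

```python
import math


def reweight(weights, distances, sigma=0.5):
    """Generic particle-filter measurement update.

    weights:   prior particle weights (assumed to sum to 1)
    distances: embedding distance between the ground image and the overhead
               patch at each particle (smaller means a better match)
    sigma:     assumed likelihood width; a tuning parameter, not from the paper
    """
    # Assumed Gaussian likelihood on the embedding distance.
    likelihoods = [math.exp(-(d * d) / (2.0 * sigma * sigma)) for d in distances]
    unnormalized = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        # Every likelihood underflowed: keep the prior rather than divide by zero.
        return list(weights)
    return [u / total for u in unnormalized]


prior = [0.25, 0.25, 0.25, 0.25]
embedding_distances = [1.2, 0.3, 0.9, 2.0]  # made-up values for illustration
posterior = reweight(prior, embedding_distances)
```

With a uniform prior, the particle whose overhead embedding is closest to the ground embedding receives the largest posterior weight; repeating this update along a path, together with odometry-based motion updates and resampling, is what lets such a filter converge.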